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J  An  algorithm  is  presented  for  the  rapid  calculation  of  the  values  and  coefficients  of  finite 
Legendre  series.  Given  an  n-term  Legendre  expansion,  the  algorithm  produces  its  values  at  n 
Chebyshev  nodes  on  the  interval  [-1,1]  for  a  cost  proportional  to  nlogn.  Similarly,  given  the  values 
of  a  function  /  at  n  Chebyshev  nodes,  the  algorithm  produces  the  n-term  Legendre  expansion  of  the 
polynomial  of  degree  n  —  1  that  is  equal  to  /  at  these  nodes.  The  cost  of  the  algorithm  is  roughly 
3  times  that  of  the  fast  Fourier  transform  of  length  n,  provided  that  calculations  are  performed  to 
single  precision  accuracy.  In  double  precision,  the  ratio  is  approximately  5.5. 

The  method  employed  admits  far-reaching  generalizations  and  is  currently  being  applied  to 
several  other  problems.  ■  ^ 
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1  Introduction 


Legendre  polynomials  are  widely  used  in  applied  mathematics.  Their  applications  include 
quadratures,  approximation  theory,  solution  of  partial  differential  equations,  analysis  of 
pseudospectral  methods,  and  severed  other  areas.  However,  attempts  to  use  Legendre 
polynomials  as  a  numerical  tool  (as  opposed  to  an  analytical  apparatus)  tend  to  meet 
with  a  serious  difficulty:  given  a  function  /  :  [—1,1]  — ►  R  tabulated  at  n  nodes,  it 

takes  order  0(n2)  operations  to  obtain  the  Legendre  expansion  of  /.  Similarly,  given  an 
n-term  Legendre  expansion,  at  takes  order  0(n2)  operations  to  evaluate  that  expansion 
at  n  nodes  in  R.  In  other  words,  unlike  the  Chebyshev  expansion  or  the  Fourier  series, 
the  Legendre  series  does  not  have  a  fast  transform  associated  with  it.  Whenever  possible, 
therefore,  the  Legendre  series  is  avoided  in  favor  of  an  expansion  for  which  a  fast  transform 
exists.  In  some  cases,  this  substitution  causes  relatively  little  difficulty  (for  example, 
in  the  construction  of  pseudospectral  algorithms  for  the  solution  of  partial  differential 
equations).  In  other  cases,  it  cannot  be  done  at  all  (for  example,  in  the  solution  of  partial 
differential  equations  by  the  separation  of  variables  in  the  spherically  symmetric  geometry, 
where  the  choice  of  Legendre  polynomials  as  the  set  of  basis  functions  is  dictated  by  the 
mathematics  of  the  problem). 

In  Orszag  [8],  a  method  is  proposed  for  the  rapid  evaluation  of  a  fairly  wide  class  of 
eigenfunction  transforms.  The  algorithm  is  based  on  the  combination  of  certain  analytical 
considerations  with  the  fast  Fourier  transform,  has  the  asymptotic  CPU  time  estimate 
of  order  n(logn)2/loglogn,  and  becomes  faster  than  the  direct  (order  n2)  algorithm  at 
n  ss  128  (in  the  case  of  the  Legendre  series). 

In  this  paper,  we  present  a  procedure  for  the  rapid  evaluation  of  a  Legendre  expansion 
at  Chebyshev  nodes  on  the  interval  [—1,1],  and  conversely,  for  the  evaluation  of  the 
coefficients  of  a  Legendre  expansion  from  a  table  of  its  values  at  Chebyshev  nodes.  More 
specifically,  given  a  function  /  expressed  as  Legendre  expansion  of  the  form 

/«)=!.,%  (i) 

>» o 

the  algorithm  evaluates  /  at  the  n  Chebyshev  nodes  •  •  •  ,  *n-i  on  the  interval  [—1, 1] 
in  order  O(nlogn)  operations.  Similarly,  given  the  values  of  a  function  /  :  [—1,1]  — ►  R 
tabulated  at  the  nodes  t0,ti,. . . , tn-i,  the  algorithm  requires  order  O(nlogn)  operations 
to  evaluate  the  coefficients  a0,  au, . . . ,  an_i  such  that 


n— 1 


i= 0 


for  t  =  0, 1, . . .  ,n  —  1. 


The  algorithm  we  present  is  based  on  replacing  the  Legendre  expansion  of  the  form 
(1)  with  a  Chebyshev  expansion  of  the  same  length,  with  subsequent  evaluation  of  the 
latter  via  the  fast  cosine  transform  (see,  e.g.,  [3]  Chap.  10).  It  turns  out  that  the 
reduction  of  a  Legendre  expansion  to  a  Chebyshev  expansion  can  be  performed  in  order 
O(n)  operations,  and  it  well  known  that  the  cosine  transform  of  length  n  requires 
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QQ 


order  O(nlogn)  operations.  Thus,  the  resulting  CPU  time  estimate  of  our  algorithm  is 
O(nlogn). 

Remark  1.  While  the  asymptotic  CPU  time  estimate  of  our  scheme  is  dominated  by  that 
of  the  fast  cosine  transform  it  employs,  in  most  practical  situations  (n  <  20, 000),  the 
conversion  of  the  Legendre  series  of  length  n  into  a  Chebyshev  series  tends  to  be  roughly 
twice  as  expensive  as  one  fast  Fourier  transform  of  length  n,  provided  the  calculations 
are  performed  in  single  precision  arithmetic.  As  a  result,  evaluating  the  series  of  the 
form  (1)  at  n  Chebyshev  nodes  by  means  of  our  algorithm  is  approximately  three  times 
as  expensive  as  a  single  FFT  of  length  n.  In  double  precision  arithmetic,  this  ratio  is 
roughly  5.5  (see  Section  6  below). 

In  the  following  section,  we  summarize  several  well-known  facts  from  approximation 
theory  to  be  used  in  the  subsequent  sections.  In  Section  3,-  the  analytical  properties  of 
the  linear  mappings  connecting  the  coefficients  of  Legendre  and  Chebyshev  expansions 
are  studied.  Sections  4  and  5  contain  the  description  of  the  algorithm  and  its  complexity 
analysis,  and  in  Section  6  results  of  several  numerical  experiments  are  presented.  Finally, 
Section  7  discusses  several  straightforward  generalizations  of  the  algorithm  of  this  paper. 

The  paper  has  two  appendices.  Appendix  A  presents  details  of  an  efficient  scheme  for 
the  evaluation  of  the  function  T(x  +  |)/r(x  + 1),  required  by  our  algorithm.  In  Appendix 
B  we  sketch  an  alternative  algorithm,  also  implemented,  which  is  competitive  with  the 
main  algorithm  presented. 

Remark  2.  The  approach  of  this  paper  is  closely  related  to  that  used  in  [9]  to  construct 
an  order  O(n)  scheme  for  the  evaluation  of  a  polynomial  of  order  n  at  n  arbitrary  points 
in  Bt.  Both  algorithms  can  be  viewed  as  particular  implementations  of  a  scheme  for  the 
rapid  application  to  arbitrary  vectors  of  matrices  whose  entries  are  sufficiently  smooth 
functions  of  their  indices.  Such  a  general  procedure  is  discussed  in  Section  7,  and  will  be 
reported  in  detail  at  a  later  date. 


2  Mathematical  and  Numerical  Preliminaries 


2.1  Miscellaneous  Facts  from  Approximation  Theory 

In  this  section  we  summarize  several  well-known  facts,  the  first  being  the  classical  error 
bound  for  Chebyshev  approximations  (see,  e.g.,  [4]). 

Lemma  1.  Suppose  k  >  2,  and  let  t,-  denote  the  ith  Chebyshev  node  of  order  k  on  [0,1] 
and  Ui(t )  the  ith  Lagrange  polynomial  associated  with  the  ti’s,  i.e., 


U  = 


u,(t) 


n 


j=0,j#i 


t-tj 
U  ~  tj  ’ 


(3) 


for  i  =  0,  1, . . . ,  k  —  1.  Suppose  further  that  f  :  [a,  b]  — »  R  is  a  function  with  k  continuous 
derivatives,  and  that  the  error  E(f,k,[a,b\\t)  of  the  k-node  Chebyshev  expansion  for  f 


2 


(4) 


at  a  point  t  €  [a,  b]  is  defined  by  the  formula 

E(f ,  *,  [a,  6);  ()  =  /(f)  -  £  u(  •  /(«  +  (6  -  <,)(,). 

Then  for  any  t  €  [a,  6], 

\E(f,  k,  (a,  6);  t)|  <  sup  |/>‘>(u)|.  (5) 

Throughout  the  paper  we  will  retain  the  notation  of  ti  for  Chebyshev  nodes,  ut(t)  for  the 
corresponding  Lagrange  polynomials,  and  E(f,  k,  [a,  6];  t)  for  the  interpolation  error. 

Next  we  provide  a  bound  on  the  derivatives  of  an  analytic  function,  which  follows 
directly  from  the  Cauchy  integral  formula. 

Lemma  2.  Suppose  that  D  C  C  is  a  closed  disk  of  radius  r  centered  at  z  €  C,  and  that 
f  :  D  — ►  C  is  a  function  continuous  on  D  and  analytic  on  its  interior  D\dD.  Then 

<  g  sup  |/(*  +  re'»)|.  (6) 

r  «€[0,2»] 


The  next  lemma  provides  an  expression  for  the  logarithm  of  the  gamma  function  (see, 
e.g.,  [7],  p.  10)  which  we  use  in  Lemma  4  below. 

Lemma  3  (Binet).  For  any  z  €  C  such  that  Re(z)  >  0, 


In  T(z)  =  (z  -  1/2)  ln(z)  —  z  +  ln(27r)/2  +  J(z), 

(7) 

where 

J(z)  =  r  -e~udt. 

w  Jo  \e*  - 1  t  2 J  t 

(8) 

Furthermore, 

*  icp 

(9) 

We  define  a  function  A  :  C  — ►  C,  by  the  formula 

AW  =  r(s  +  i/2) 
v  ’  r(z  +  i) 

(10) 

The  function  A  will  often  appear  in  the  remainder  of  the  paper.  The  following  lemma 
states  specific  bounds  on  |A(z)|  that  we  will  need  in  Section  3. 

Lemma  4.  Suppose  that  z  6  C  and  Re(z)  >  0.  Then 


e-1/2 

r - 7  <  |A(z)| 

\J \z  +  1| 

(ii) 

CLTld 

* 

|A(*)I  <  -Jr—- 
\/\2  + 1 

(12) 
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Proof.  Combining  Binet’s  expression  for  lnT  (Eq.  7)  with  the  definition  of  A  (Eq.  10), 
we  obtain 

In  A (*)  =  F(z)  +  |(1  -  ln(z  +  1))  +  <?(*),  (13) 

with  the  function  F  :  <D  — ►  <D  defined  by  the  formula 

FW  =  *ln(^),  (14) 

and  the  function  Q  :  <D  — *  C  defined  by  the  formula 

Q(z)  =  I(z  +  1/2)  -  I(z  +  1),  (15) 

where  I(z)  is  given  by  Eq.  (8).  Combining  the  estimate  (9)  for  |I(z)|  with  Eq.  (15),  we 
see  that 

|He(<?(z))|  <  1  (16) 

for  any  z  with  Re(z)  >  0,  and  simple  analytical  manipulations  show  that 

-  1  <  Re(F(*))  <  0  (IT) 

for  any  z  with  Re(z)  >  0.  Now,  combining  Eq.  (13)  with  estimates  (16)  and  (17),  we 
obtain 

<  Re(ln  A(z))  -+-  ~ In  |z  -+- 1 1  <  1, 
from  which  (11)  and  (12)  follow  immediately.  | 


2.2  Definition  of  Legendre  and  Chebyshev  Polynomials 

An  orthogonal  family  of  polynomials  <pi,  <p2, . . . ,  is  a  set  of  polynomials  of  degrees 
0, 1, 2, . . . ,  in  which  the  inner  product  (<£,-,  <pj)  is  zero  if  i  /  j  and  positive  if  i  =  j.  The 
inner  product  is  defined  by  the  formula 


(/,£)=  /  f(i)g(t)w(t)dt, 
J  a 


where  [a,  i]cl  and  the  weight  function  w  is  continuous  and  non- negative  on  [a,  b ]. 

The  Legendre  and  Chebyshev  polynomials  each  form  an  orthogonal  family  of  poly¬ 
nomials.  In  each  case  the  inner-product  integral  is  taken  over  the  interval  [—1,1];  for 
the  Legendre  polynomials,  the  weight  function  is  w(t)  =  1,  while  for  the  Chebyshev 
polynomials,  w(t)  =  1/y/l  —  t2. 

Any  orthogonal  polynomial  family  satisfies  a  three-term  recurrence  relation  (for  n  >  0) 
of  the  form 

V?_!(<)  =  0,  <p0(t)  =  Ao, 


-T^n+l  (t)  = 

*An+l 


(t<Pn,V>n) 
(^n,9n)  . 


¥>»(<)  “ 


(y?n,fy?n- 1) 

(^n-l,  Pn-l) 


(18) 
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where  An  ^  0  is  the  leading  coefficient  of  <pn(t)  and  can  be  chosen  arbitrarily.  For  the 
Legendre  polynomials  P0,  Pi,  P2, . . . ,  Eq.  (18)  takes  the  form 

P0(t)  =  1,  Pi(t)  =  t , 

a+.M  =  Trr‘  p.M  -  rr7p"->(()'  <n  *  *)•  (i») 

n  +  1  n  +  1 

For  the  Chebyshev  polynomials  To,  Ti,  T2, . . . ,  Eq.  (18)  becomes 


T0(t)  =  1,  Ti(f)  =  t, 

Tn+X(t)  =  2<  Tn(t)  -  Tn_a(<),  (n  >  1). 

In  addition  to  the  above  recurrences  which  define  Pn  and  Tn  (n  —  0, 1, 2, . . .),  equivalent 
closed-form  expressions  are  available.  The  Legendre  polynomials  can  be  given  by  the 
equation 

and  the  Chebyshev  polynomials  by  the  equation 

Tn(t)  =  cos(naxccost),  (n  >  0). 

Everything  in  this  section  can  be  found  in  standard  texts  (see,  e.g.,  [4]  Sec.  4.4). 


2,3  Connection  between  Legendre  and  Chebyshev  Expansions 

We  will  denote  by  Mn,  Ln  a  pair  of  n  x  n-matrices  defined  by  the  formulae 


Mn 


(%\  if  0  =  i  <  j  <  n  and  j  is  even 

‘  (^)  A  (^)  if  0  <  i  <  j  <  n  and  i  +  j  is  even 

„  0  otherwise 


1 


< 


2A(.) 

,  zAi±m 
0 


if  i  =  j  =  0 
if  0  <  i  =  j  <  n 

A  ( 3~'~ 2)  A  (J+2r~  )  if  0  <  i  <  j  <  n  and  i  +  j  is  even 

otherwise, 


(20) 


(21) 


with  A  defined  by  Eq.  (10). 

Remark  3.  While  Eqs.  (20)  and  (21)  define  A/,”  and  for  integer  values  of  i,  j,  it  is 
apparent  from  Eq.  (10)  that  Mn  and  Ln  can  be  naturally  viewed  as  functions  on  €  x  C, 
and  we  define  M,C  :  C  x  C  — ►  C  by  the  formulae 


M(x,y) 

c(x,y) 


~y(x  +  1/2)  ,  fy-x-2\  fy  +  x-  1\ 

(y  +  x  +  l)(y  -  1)  V  2  J  \  2  J 


(22) 

(23) 
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Clearly,  M,"  =  M(i,j)  if  0  <  i  <  j  <  n  and  i  4-  j  is  even.  Similarly,  L™-  =  £(i,j)  if 
0  <  i  <  j  <  n  and  i  +  j  is  even.  It  easily  follows  from  the  well-known  properties  of  the  T- 
function  (see,  e.g.,  [1])  that  M,  C  Eire  meromorphic  functions  of  each  of  their  arguments 
with  the  poles  of  M  located  at  the  points  y  =  ±x  —  1,  ±x  —  3,  ±x  —  5, . . .  and  the  poles 

of  C  at  the  points  y  =  x  +  1,  —  x,  x  —  1,  —  x  —  2, _ 

The  matrices  Mn ,  Ln  are  inverses  of  each  other;  their  definition  is  motivated  by  the 
following  well-known  fact  (see,  e.g.,  [5],  §8.91-2): 

Lemma  5.  Suppose  that  the  function  f  :  [—1,  lj  — »  R  has  a  finite  Legendre  expansion  of 
the  form 

n— 1 

/(cos  0)  =  ^2  a,  ■  P,(cos  6).  (24) 

t=0 

Then  it  also  has  a  finite  Chebyshev  expansion  of  the  form 

/(cos0)  =  £Vt;(cos0),  (25) 

i=0 

where  a  =  (ar0, . . . ,  an-i)  and  0  =  (0O,  •  •  • ,  0n-i)  are  related  by  the  equation 

0  =  Mna.  (26) 

Conversely,  if  f  is  a  function  given  by  Eq.  (25),  then  it  may  be  expressed  in  the  form  of 
Eq.  (24),  where  a  is  given  by 

a  =  Ln0.  (27) 


3  Analytical  Properties  of  the  Mappings  At,  C 

Definition  1.  Suppose  that  a  square  S  C  RxR  is  defined  by  the  formula  S  =  [ro,xo+a]x 
[yo,yo  +  a],  where  a  >  0.  We  will  say  that  S  is  separated  from  the  diagonal  ifyo  —  Xo  >  2a. 


The  following  theorem  is  the  principal  analytical  tool  of  this  paper.  It  states  that  on  any 
square  separated  from  the  diagonal,  the  entries  of  Mn  and  Ln  are  well  approximated  by 
Chebyshev  expansions  of  the  functions  M ,  £  with  respect  to  either  the  first  or  the  second 
coordinate.  For  any  x  €  C  we  will  define  a  pair  of  functions  M-x  and  £f  by  the  formulae 


M-X{y)  =  M(x,y) 


for  all  y  €  C. 
formulae 


for  all  i£C. 


CM  = 

X  +  2 

Similarly,  for  any  y  £  C  we  will  define  the  functions  and  by  the 


M?(x)  —  M(x,  y) 
£*(*)  = 


* 
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Theorem  1.  Suppose  the  square  S  =  [x0,x0  +  a]  x  [y0,y0  +  a],  a  >  2,  is  separated  from 
the  diagonal  and  ( x ,  y )  6  S.  Then 


* 


I 


and 


Similarly, 


and 


\E(Mv,k,[x0,x0  +  a];x)|  <  ^-|.M(x,y)| 


\E(MX,  k,  [y0,  yo  +  a];  y)\  <  ^-| M(x,  y) |. 


I E(Cy,  k,  [x0,  x0  +  a];  x)|  < 
\E(£X,  k,  [y0,  yo  +  a};  y)|  < 


128e3 

3* 

224e3 

3k 


£(x,y) 


x  +  i 

£(x,y) 


x  + 1 


where  E  is  the  error  for  Chebyshev  interpolation,  as  defined  in  Eq.  (4). 


(28) 

(29) 

(30) 

(31) 


Proof.  We  will  prove  here  only  the  estimate  (28),  since  the  proofs  of  all  four  state¬ 
ments  (28),  (29),  (30),  (31)  are  quite  similar.  In  order  to  prove  (28),  we  will  prove  two 
inequalities 


\E(MV,  k,  [x0,  x0  +  a];  x)|  < 


_ 32e'i/x _ 

3k\J(y  -  x  +  2)(y  +  x  +  2) 


(32) 


\M(x,y)\  > 


4e~1/7r 

\J (y  ~  x  +  2)(y  +  x  +  2) 


(33) 


for  all  (x,y)  €  S.  and  observe  that  estimate  (28)  is  an  immediate  consequence  of  (32) 
and  (33). 


a)  Proof  of  inequality  (32).  In  order  to  apply  the  error  estimate  (5)  of  Lemma  1  to  the 
function  Mv ,  we  will  need  a  bound  on  the  derivatives  of  My.  First,  we  establish  a  bound 
on  My(x  +  | (y  —  x)z,e)  for  any  (x,  y)  €  S  and  9  E  [0,  27t],  It  immediately  follows  from 
the  definition  of  M.  (Eq.  22)  combined  with  estimate  (12)  that 


Mv  (x  +  ^(y  -  x)e’^  =  (x  +  ^(y  -  x)e'9 ,  y^j 

■  ^lA  OHr  I)e")  I  •  I' A  +  !<"  - 1>e"’)  I 


T  %/[(y  -  *)/8  + 1]  •  [(y  +  7x)/8  + 1] 


(34) 


for  any  6  €  [0,  2tt].  Combining  estimate  (34)  with  the  Cauchy  integral  estimate  (6)  and 
noting  that  y  —  x  >  a,  we  obtain 


dkMy 

dxk 


< 


k\  16e2/7r 

(3a/4)fc  \J{y  -  x  -(-  2)(x  +  y  +  2) 


(35) 
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Now  (32)  follows  from  a  combination  of  (35)  with  estimate  (5). 

b)  Proof  of  inequality  (33).  The  lower  bound  on  M.  is  easily  established  by  combining 
the  definition  of  M.  (Eq.  22)  with  estimate  (11),  which  yields 


Afy:x) 

■  A(y  +  I) 

\  2  J 

V  2  J 

2 _ e-> _ 

*  y/Kv  -  x )/2  +  1]  •  [(y  +  *5/2  +  1]  ’ 


for  any  (x,  y)  £  S,  and  inequality  (33)  immediately  follows.  | 

Remark  4-  Estimates  (28),  (29),  (30),  and  (31)  in  Theorem  1  are  quite  pessimistic. 
Numerical  experiments  indicate  that  the  errors  in  those  estimates  all  decay  approximately 
as  5~k,  as  opposed  to  3~k.  In  fact,  8-point  Chebyshev  expansions  will  approximate  M. 
and  C  with  roughly  single  precision  accuracy  (7  digits)  on  any  square  separated  from 
the  diagonal.  Similarly,  double  precision  (16  digits)  is  achieved  with  18-point  expansions. 
For  our  purposes,  however,  the  estimates  of  Theorem  1  are  adequate. 


4  Informal  Description  of  the  Algorithm 

We  now  define  a  concept  closely  analogous  to  the  separation  of  a  square  from  the  diagonal. 

Definition  2.  Suppose  A  is  an  upper-triangular  n  x  n-matrix  with  entries  {A,,},  i,j  = 
0, 1, . . . ,  n  —  1.  Suppose  further  that  T  is  an  mx  m-submatrix  of  A  defined  by  the  formula 

Tij  —  Ap+,i?+j 

with  p,  q  two  non-negative  integers.  We  will  say  that  the  submatrix  T  of  matrix  A  is 
separated  from  the  diagonal  if 

q  —  p>  2m. 


t 


4.1  A  Simple  Example 

Suppose  that  T  is  an  m  x  m-submatrix  of  matrix  Mn,  and  that  T  is  separated  from 
the  diagonal.  We  present  a  simple  example  of  how  the  smoothness  of  Mn  enables  us  to 
efficiently  compute  w  =  Tv,  where  v  =  (vq,  . . . ,  um_i)  is  an  arbitrary  vector  of  length  m. 
To  compute 

m  —  1 

tu,  =  M(io  +  I,  Jo  +  j)  •  Vj  for  i  =  0, . . . ,  m  -  1,  (36) 

j- o 

we  may  approximate  Ai(io  +  i,jQ  +  j)  by  its  Chebyshev  expansion  in  j.  We  have 


k- 1 

M(io  +  Jo  +  j)  «  M(*o  +  bio  +  trm)  •  uT 

r=0 


(37) 
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where  we  know  from  Theorem  1  that  as  k  grows,  the  error  of  this  approximation  shrinks 
as  3-fc.  Substituting  (37)  into  Eq.  (36)  and  changing  the  order  of  summation,  we  obtain 

k-l  m— i  /  j  \  fc_1 

-M(io  +  ij0  +  tTm )  53  “r  (  —  )  *  vj  =  53  M(i0  +  i,j0  +  <rm)  •  br 

r=0  j=0  'vm/  r=0 

where  br  =  «r(j/m )  •  u,  (r  =  0, 1, . . . ,  k  -  1). 

The  number  of  operations  required  to  evaluate  w  in  this  manner  is  O(km).  Indeed, 
evaluating  60,  -  -  - ,  6fc_i  requires  order  O (km)  operations  ( k  coefficients  at  m  operations 
per  coefficient).  Evaluating  the  vector  w  given  the  coefficients  bo, ,  fe^-i  is  also  an  order 
O (km)  procedure  (evaluating  a  fc-term  expansion  at  m  nodes).  For  a  fixed  precision  e, 
the  number  k  of  Chebyshev  nodes  required  is  log3  and  is  independent  of  m.  Thus  the 
cost  of  the  evaluation  of  w  =  Tv  has  been  reduced  from  order  0(m2)  to  O(mlog  i). 

This  example  represents  only  a  part  of  the  computation  required  to  apply  matrix 
Mn  to  an  arbitrary  vector,  since  the  submatrix  T  is  assumed  to  be  separated  from  the 
diagonal.  The  actual  algorithm  is  constructed  by  extending  the  above  example  in  order 
to  apply  the  entire  matrix.  The  matrix  Mn  can  be  divided  into  square  submatrices,  each 
of  which  is  separated  from  the  diagonal  (Figure  1).  To  apply  the  matrix  Mn  to  the  vector 


Figure  1:  Each  submatrix  T  in  the  subdivision  of  Mn  is  separated  from  the  diagonal  (see 
Definition  2).  Here  the  subdivision  is  shown  to  three  levels. 

a,  each  submatrix  T  is  applied  separately,  as  suggested  in  the  example.  The  remaining 
undivided  portion  of  Mn  near  the  diagonal  is  applied  directly.  Before  we  can  introduce 
the  algorithm,  however,  we  will  need  some  additional  notation. 
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4.2  Notation 

The  algorithm  to  be  described  will  input  an  arbitrary  vector  a  =  (a0, . . . ,  an„i)  and 
compute  an  approximation  7  =  (70, . . .  ,7n-i)  to  the  vector  0  =  Mna.  Theorem  1  will 
be  used  to  ensure  the  desired  quality  of  approximation. 

Suppose  that  s  is  a  positive  integer  such  that  h  =  log2(n/s)  —  1  is  also  a  positive 
integer.  For  any  integer  /  £  {1  ,  and  j  €  {0, . . .  ,n/(2,-1s)  —  1},  we  define  the 

interval  Iij  C  R  by  the  formula 

hj  =  b-  2'-^,  (j-hi)-^]. 

For  any  integer  /  €  {1, . . . ,  h},  i  €  {0, . . .  ,n/(2t~1s)  —  3},  and  j  £  {i  +  2,  i  +  3}  (for  i 
even),  j  =  i  +  2  (for  i  odd)  we  define  the  square  /5tiJ  C  Bt2  by  the  formula 

iSij  =  //,,  x  Iij. 

The  definition  of  the  squares  iSij  is  illustrated  in  Figure  2. 


Figure  2:  The  upper  triangle  of  the  square  [0,  n]  x  [0,  n]  is  subdivided  into  squares,  each 
of  which  is  separated  from  the  diagonal  (Definition  1).  Here  the  number  h  of  levels  is 
equal  to  2. 


For  l  €  {1  ,...,/i},  m  £  {0,  ...^  ls  —  l},  and  r  £ 
Chebyshev  interpolation  coefficient  ulr  m  by  the  formula 


{0, . . . ,  k  —  1},  we  define  the 


u 


l 

r,m 


=  Ur 


(38) 
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4 


t 


» 


where  uT  is  given  by  Eq.  (3).  For  each  interval  hi  and  r  £  {0,  ...,&  —  1}  we  define  the 
coefficient  bTt  j  by  the  formula 


Ki  =  E  “t.m  •  (39) 

m=0 

and  observe  that  the  definition  of  is  analogous  to  the  definition  of  bT  in  the  example 
above.  For  each  square  iSij  we  define  a  k  x  fc-matrix  iMij  of  which  each  element 
(r,  m  £  {0, . . . ,  k  —  1})  is  given  by  the  formula 

iMTif  =  M((i  +  tr)  ■  2l~'s,  (j  +  tm)  •  2'“1s).  (40) 

We  further  define,  for  each  square  and  r  £  {0, . . . ,  k  —  1},  the  coefficient  icTt  ]  by  the 
formula 

=  E  'M<T  ■ 

m=0 

For  each  interval  ijj  and  r  €  {0,  —  1},  we  define  the  coefficient  cli  by  the  formulae 


cf.2  i 


_  l«wa  +  U<n/(2‘3)-2) 

c/,2j+l  —  <c2j+1,2j'+3 


cr,n/(2<-l*)-2 


-  Cr,n/(2‘-»»)-l  -  0 


(41) 


For  each  interval  Iij  and  m  6  {0, . . . ,  2‘-1s  — 1}  we  define  the  coefficient  by  the  formula 

tfi  =  E  <m  • 

r=0 

Finally,  for  i  £  {0, . . . ,  n  —  1},  we  define  7,  by  the  formula 


7i  = 


r^(^)l 

E 

/=i 


l«'/*Ja+2»— 1 


M" 


«r, 


(42) 


where  =  L*/(2,-13)J  and  mj,,-  =  imod(2,-ls). 

The  notation  introduced  so  far  allows  us  to  extend  the  example  of  Section  4.1  to  apply 
the  entire  matrix  Mn  to  an  arbitrary  vector.  The  simplest  algorithm,  which  is  described 
in  the  next  section,  requires  order  0(n  log  n)  operations.  We  have  introduced,  however, 
one  change  from  the  example  of  Section  4.1:  the  values  of  M  axe  interpolated  with  respect 
to  both  the  first  and  the  second  coordinates,  rather  them  just  the  second  coordinate.  This 
change  allows  us  to  construct  an  algorithm  requiring  order  O (n)  operations.  This  latter 
algorithm  is  described  in  Section  4.3,  and  will  require  some  additional  notation. 

We  note  that  for  any  integer  l  £  {2, . . . ,  h},  m  £  {0, . . .  ,2/-1s  —  1 } ,  and  r  £  {0, . . . ,  k  — 
1},  the  Chebyshev  interpolation  coefficient  uj.  defined  in  Eq.  (38)  can  be  equivalently 
given  by  the  formula 


(43) 
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Remark  5.  Eq.  (43)  is  an  instance  of  the  more  general  formula 

/t\ 

“*■(<)  —  53  ur  ( -j- )  •  Ui(2t)  for  any  t  €  R.,  (44) 

i=0  \*s 

which  immediately  follows  from  the  observation  that  the  summation  in  Eq.  (44)  is  the 
(k  —  l)Bt-degree  interpolating  polynomial  which  agrees  with  the  function  ur  at  points 
to/2,  ti/2, . . . ,  <*_ i/2,  and  that  ur  itself  is  a  polynomial  of  degree  k  —  1  (see  Eq.  3). 

Eq.  (43)  combined  with  Eq.  (39)  produces  a  recursive  expression  for  b] ■  given  by  the 
formulae 


\,j  —  53  Ur  (  ' )  ‘  a*'+i»  (4^) 

t=0 

-  E  [“r  (|)  • (iiii)  •  (>!.,.„«]  (i£{2 . h}).  (46) 

Following  a  similar  procedure,  for  each  J/j  and  r  6  (0, . . . ,  k  —  l},  we  recursively  define 
the  coefficient  by  the  formulae 


ch,j 

di,2j  =  ci,2j  +  53  (^)  ‘  j  (l  h  -  1})  (47) 


^l,2j+l  ~  Cl,2j+1  +  53  u*  (  2  )  ' 


where  cjj  is  given  by  Eq.  (41).  Finally,  combining  Eqs.  (45),  (46),  and  (47),  we  see  that 
the  vector  7,  defined  by  Eq.  (42),  can  be  equivalently  given  by  the  formula 


fc-i 


«=o 


7 m+j.  =  53  (t)  '  +  S  ■  ai+j>  for  rn  =  0, . . . ,  s  -  1, 

'=m  j  =  0, . . . ,  n/s  —  1. 


2*— 1 


(48) 


4.3  Description  of  an  0(n  log n)  Algorithm 

A  simple  algorithm  for  the  application  of  the  matrix  Mn  to  an  arbitrary  vector  a  = 
(a0,...,an_  1)  can  be  constructed  by  combining  the  procedure  of  Section  4.1  and  the 
subdivision  of  Mn  shown  in  Figures  1  and  2.  The  procedure  of  Section  4.1  works  only 
for  a  submatrix  which  is  separated  from  the  diagonal.  The  matrix  Mn  can  be  divided, 
however,  into  a  collection  of  submatrices,  each  of  which  is  separated  from  the  diagonal, 
plus  an  undivided  part  near  the  diagonal  (see  Figure  1).  By  applying  the  scheme  of 
Section  4.1  to  each  submatrix,  we  immediately  obtain  an  order  O(nlogn)  algorithm  for 
the  application  of  the  matrix  Mn  (or  Ln )  to  a. 


% 


i 
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4.4  Description  of  an  O  (n)  Algorithm 

Now  we  describe  an  order  0(n)  algorithm  for  the  application  of  the  matrix  Mn  (or  matrix 
Ln)  to  an  arbitrary  vector.  The  matrix  Mn  is  divided  into  submatrices,  each  of  which 
is  separated  from  the  diagonal.  The  scheme  of  such  a  subdivision  is  shown  in  Figure  1. 
There  are  3  squares  of  side  length  n/4,  3  •  3  of  side  length  n/8,  3  •  7  of  side  length  n/ 16, 
and  so  forth  down  to  3  •  ( 2h  —  1)  squares  of  side  length  s  =  n/21+fc.  The  size  s  x  s  of 
the  smallest  squares  in  the  subdivision  is  fixed,  independent  of  n.  For  each  square  the 
separation  condition  of  Theorem  1  holds. 

The  most  direct  application  of  Chebyshev  expansions,  to  each  square  independently, 
leads  to  an  order  O(nlogn)  algorithm.  This  operation  count  is  due  to  order  O (n)  oper¬ 
ations  for  all  squares  of  one  size,  multiplied  by  O(logn)  different  sizes.  The  undivided 
part  of  the  matrix  near  the  diagonal  is  handled  directly  in  order  0 (n)  operations. 

We  improve  on  this  simple  method  by  using  in  each  square  the  Chebyshev  expansion 
for  Mn  in  both  row  and  column  directions.  We  then  “gather”  coefficients  used  in  each 
row  interval  up  from  those  for  the  next  smaller  intervals,  compute  k  x  k  matrix-vector 
products,  then  “spread”  the  results  down  to  the  next  smaller  column  intervals.  To  de¬ 
scribe  this  procedure,  we  employ  the  notation  introduced  in  Section  4.2.  The  coefficients 
b[j  are  computed  from  level  1  to  level  h  according  to  Eqs.  (45)  and  (46).  Then  the 
matrix-vector  products  jcjj  are  computed  and  summarized  to  the  values  c\ :  (Eq.  41). 
These  values  are  used  to  compute  the  coefficients  djj  from  level  h  to  level  1,  as  specified 
in  Eq.  (47).  Finally,  the  vector  7  is  computed  according  to  Eq.  (48). 

This  incremental  computation  of  the  coefficients  b] ^  and  the  coefficients  leads  to 
the  reduction  in  complexity  from  order  O(nlogn)  to  0(n)  operations.  Now  instead  of 
order  0 (n)  operations  for  all  squares  of  one  size,  we  expend  order  0(n)  operations  for 
all  squares  of  the  smallest  size,  half  as  many  for  the  next  larger  size,  one  fourth  as  many 
for  the  second  larger  size,  and  so  forth.  The  sum  of  these  operation  counts  remains  order 
°(n)‘ 

We  divide  the  computation  into  an  initialization  phase,  which  is  independent  of  a  = 
(ar0j . .  • ,  «n-i).  depending  only  on  n,  and  an  evaluation  phase,  which  does  the  rest  of  the 
work.  This  partitioning  of  the  algorithm  leads  to  substantial  savings  when  one  computes 
many  Legendre  transformations  of  the  same  size. 

In  the  initialization  phase,  the  values  of  for  each  square,  as  defined  in  Eq.  (40), 

are  computed  and  stored;  the  near-diagonal  values  of  M,  appearing  in  Eq.  (48),  are 
also  computed.  Details  of  an  efficient  scheme  for  the  evaluation  of  M.  are  contained  in 
Appendix  A.  The  values  of  uP  appearing  in  Eqs.  (45)  and  (46)  are  also  computed  during 
the  initialization  phase,  and  later  used  in  the  evaluation  phase.  The  evaluation  phase 
consists  of  using  the  stored  values  of  M.  and  ur  in  computing,  in  succession,  b\  - ,  c\  - ,  df  •, 
and  7 i,  as  described  above. 
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5  Detailed  Description  and  Complexity  Analysis  of 
the  Algorithm 

5.1  Description  of  the  Algorithm 

Initialization  Phase 

Comment  [Input  to  this  phase  is  the  number  of  values  n]. 

Set  the  number  of  Chebyshev  nodes  per  interval  k  «  log(l/e), 
where  e  is  the  desired  precision.  Set  the  smallest  interval 
size  s  «  4k.  (See  Section  5.2,  below.) 

Step  1. 

Comment  [Construct  Chebyshev  nodes  to,i i, . . .  ,tk-i  on  the 
interval  [0,  l],  Chebyshev  nodes  t'Q, . . . ,  t'k_x  on  the  interval  [0,  |], 
and  Chebyshev  nodes  t'k , . . .  on  the  interval  [|,  1].] 

do  r  =  0, . . . ,  k  —  1 

set  tT  =  [l  —  cos((r  +  .5)tt/A:)]/2 
set  t'r  =  tr/2  and  tk+r  =  (1  -f  tr)/2 
enddo 


Step  2. 

Comment  [Evaluate  the  denominators  in  the  expressions  for  the 
Chebyshev  interpolation  coefficients  uo,u\,. . u*_i.] 
do  r  =  0, . . . ,  k  —  1 

set  denr  =  nto^r^r  ~  *i) 

enddo 

Comment  [Evaluate  the  Chebyshev  interpolation  coefficients 
uq,  ttj, . . . ,  ujt-i  at  the  uniformly  spaced  nodes  0,  j,  j, . . . , 
do  /  =  0, . . . ,  s  —  1 

set  x  =  UrZo(l/s  ~  *r) 
do  r  =  0, . . . ,  k  —  1 

set  ur(l/s)  =  [x/(l/s  —  tr)]/denr 

enddo 

enddo 

Comment  [Evaluate  the  Chebyshev  interpolation  coefficients 
uq,  mi,  . . . ,  ujt_ i  at  the  Chebyshev  nodes  t'0, . . . ,  t'2k-\ •] 
do  /  =  0, ...  ,2k  —  1 
set  X  =  rir=o(*/  -  *<■) 
do  r  =  0, . . . ,  k  —  1 

set  =  [z/(</  —  tT)\/denT 

enddo 
enddo 
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Step  3. 

Comment  [Evaluate  the  values  of  the  k  x  k 

matrix  on  each  square  /St  J  of  the  subdivision.] 
h  =  log2(n/s)  -  1 
do  /  =  1, . . .  ,h 

do  t  =  0, . . .  ,n/(2l~1s)  —  3 

do  j  =  i  +  2, . . . ,  i  4-  3  —  i  mod  2 
do  r  =  0, . . . ,  k  —  1 
do  m  =  0, . . . ,  k  —  1 

Calculate  i-M?™  according  to  Eq.  (40). 
enddo 
enddo 
enddo 
enddo 
enddo 


Step  4. 

Comment  [Evaluate  M.  in  the  undivided  part  of  the  matrix, 
near  the  diagonal.] 
do  j  =  0, . . .  ,n/s  —  1 
do  m  =  0, . . . ,  a  —  1 

set  p  =  2s  —  2  m  mod  2 
do  i  =  m,  m  +  2,  m  +  4, . . . ,  p 

Calculate  M(m  +  js,i  +  js)  using  Eq.  (22). 
enddo 
enddo 
enddo 

End  of  Initialization  Phase 


Evaluation  Phase 

Comment  [Input  to  this  phase  is  a  =  (c*o, . . . ,  <*r»-i)-] 

Step  5. 

Comment  [Evaluate  the  coefficients  6J  •  from  the  input 
vector  (ao,  •  •  •  ,an-i)  and  the  interpolation  coefficients  ur(t/s).] 
do  j  =  0, . . .  ,n/s  —  1 
dor  =  0,...,fc-l 

set  b\  j  as  TT’Zo  «r (i/s)  ■  ai+j. 
enddo 
enddo 
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Step  6. 

Comment  [Evaluate  the  coefficients  br,j  for  /  >  2,  which 
correspond  to  larger  interval  sizes,  from  the  coefficients  for 
smaller  interval  sizes  and  the  interpolation  coefficients  ur(t().] 
do  l  =  2, . . . ,  h 

do  j  =  0, . . .  ,  n/(2,-1s)  -  1 
do  r  =  0, . . . ,  k  —  1 

set  b\  -  =  5Zt=0  +  ur(tfc+»)  '  ^/-l,2j+l] 

enddo 

enddo 

enddo 


Step  7. 

Comment  [Evaluate  the  coefficients  CU  from  the 
values  and  the  coefficients  b]j.] 

do  /  =  1, . . . ,  h 

do  j  =  0, . . . ,  n/(2's)  —  2 
do  r  =  0, . . . ,  k  —  1 

Set  CJ,2j  =  £?=  0  ['-^2>,2j+2  ‘  fy,2j+2  +  l^2j,2j+3 
set  C[2j+1  =  S;_o  /^2>+l,2j+3  ’  ^'l,2j+3 

enddo 

enddo 

enddo 


Step  8. 

Comment  [Evaluate  the  coefficients  4j  from  the 
coefficents  c]  -  and  the  interpolation  coefficients  ur(<[).] 
do  /  =  h,  h  —  1, . . . ,  1 

do  j  =  0, . . . ,  n/(2,-1s)  —  2 
do  r  =  0, . . .  ,k  —  1 

set  drlaj  =  CTlaj  +  Ef=0  «i(* r)  •  4+1, j 
set  dj2j+l  —  Cl,2j+1  d"  5Ti=0  U>(tk+r)  ’  4+1, j 
enddo 
enddo 
enddo 


16 


Step  9. 

Comment  [Evaluate  the  final  result  7  =  (70, . . . ,  7n-i)  from 
the  coefficients  the  interpolation  coefficients  ur(m/s),  the 
values  of  M  near  the  diagonal,  and  the  input  vector  <?.] 
do  j  =  0, . . .  ,n/s  —  1 
do  m  =  0, . . . ,  s  —  1 

set  7m+j»  =  £,fc=o  Ui(m/s )  •  d[j 
set  p  =  2s  —  2  +  m  mod  2 
do  i  =  m,  m  +  2,  m  4-  4, . . . ,  p 

set  7m+j*  =  7m+j«  +  X(m  +  js,  i  +  js)  • 
enddo 
enddo 
enddo 

End  of  Evaluation  Phase 
End  of  Algorithm 


The  algorithm  for  the  opposite  direction,  namely,  computing  Legendre  expansion 
coefficients  from  Chebyshev  expansion  coefficients,  is  identical  to  the  algorithm  given 
above,  with  C  substituted  for  M. 

Remark  6.  In  Theorem  1,  error  bounds  for  Chebyshev  expansions  in  a  square  separated 
from  the  diagonal  are  given  for  C(x,  y)/(z  4-  §),  rather  than  for  £(x,  y).  In  principle,  then, 
the  algorithm  for  computing  Legendre  coefficients  from  Chebyshev  coefficients  should 
apply  the  matrix  corresponding  to  C(x,y)/(x  +  5)  using  the  method  given  above  and 
then  apply  the  diagonal  matrix  corresponding  to  x  +  Doing  so  would  produce  an 
algorithm  of  the  same  asymptotic  time  complexity  as  that  for  M,  but  slightly  more 
expensive.  In  practice,  this  alteration  produces  virtually  no  improvement  in  accuracy,  so 
our  implementation  uses  the  same  algorithm  in  both  directions. 

5.2  Complexity  Analysis 

In  the  following  table,  we  provide  the  operation  count  for  each  step  of  the  algorithm. 
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Initialization  Phase 


Step 

Complexity 

Explanation 

1. 

0  (*) 

Chebyshev  nodes  •  •  •  >  tk-i  and  t'0, . . . , 

(a  total  of  3A:  nodes)  are  computed. 

2. 

0 (k2  4-  ks ) 

Interpolation  coefficients  itr(//s),  for  /  =  0, 1, . . . ,  s  —  1 
and  uT(t'i),  for  /  =  0, . . . ,  2k  —  1  are  computed 
(r  =  0, . . . ,  fc  —  1).  The  average  cost  per  coefficient 
is  constant. 

3. 

O  (nk2/s) 

Matrix  entries  are  computed. 

There  are  k2  of  these  values  per  square;  there  are 
|(n/s  —  2)  squares  of  the  smallest  size,  §(n/(2s)  —  2) 
squares  of  the  next  size,  and  so  forth,  up  to  3  squares 
of  the  largest  size.  Thus  the  total  number  of  squares 
is  order  0 (n/s),  at  a  cost  of  order  0(k2)  per  square. 

4. 

O  (ns) 

Matrix  entries  M(m  4-  js,  i  +  js)  are  computed. 

There  are  no  more  than  2s  of  these  values  per  row 
of  the  matrix  Mn,  and  Mn  has  n  rows. 

Total 

O  (n(s  +  k2/s )) 

Evaluation  Phase 

Step 

Complexity 

Explanation 

5. 

0  (nk) 

Coefficients  b\j,  for  j  =  0, 1, . . . ,  n/s  —  1  and 
r  =  0, . . . ,  k  —  1  are  computed,  at  order  O(s) 
operations  each. 

6. 

0(nk2/s) 

Coefficients  for  /  >  1  are  computed. 

There  are  (n/s  —  4)  •  k  of  these,  and  each  requires 
O(fc)  operations  to  compute. 

7. 

0  (nk2/s) 

Coefficients  cTl  }  are  computed.  There  are 

2[n/s  —  log2(n/s)  —  1]  •  k  of  these,  and  each  is 
computed  in  order  0 (k)  operations. 

8. 

0(nk2/s) 

Coefficients  dJJ  are  computed.  Again  there 
are  2[n/s  —  log2(n/s)  —  l]  •  k  c these,  and  each 
is  computed  in  order  O(fc)  operations. 

9. 

0  (nk  +  ns) 

Final  results  7,  are  computed.  There  are  n  of 
these  and  each  requires  order  0(1*  +  s)  operations 
to  compute. 

Total 

0  (n(s  +  k  +  k2/s)) 

18 


The  total  computation  time  for  the  initialization  phase  is 

tinit  =  cti  •  k  +  aj  •  k2  +  a3  *  ks  +  a4  •  nk2/s  +  as  ■  ns 
and  the  time  for  the  evaluation  phase  is 

tev»i  =  oe  •  ns  +  07  •  nk  +  a8  •  nk2/s, 

where  a*,  02, . . . ,  as  depend  on  the  implementation,  language,  and  computer  system.  The 
length  s  of  the  smallest  interval,  however,  is  not  determined  by  the  problem,  and  can  be 
chosen  arbitrarily.  Choosing  s  to  minimize  the  evaluation  time  £evaI  yields 


For  this  choice  of  s,  the  initialization  time  and  evaluation  time  tev« 1  each  are  order 
O(nAr).  The  actual  values  of  k  and  s  used  in  our  implementation  are  given  in  the  next 
section,  which  also  includes  running  times  and  accuracies. 

C  Numerical  Results 

We  have  implemented  the  O(n)  algorithm  described  above.  It  has  been  combined  with 
a  cosine  transform  (order  n  log  n)  to  produce  a  code  converting  coefficients  of  Legendre 
expansions  (of  functions  in  the  interval  f-l,!])  into  values  of  such  functions  at  Chebyshev 
nodes,  and  vice  versa.  In  the  forward  direction,  the  algorithm  transforms  function  values 
at  Chebyshev  nodes  to  Legendre  expansion  coefficients,  while  in  the  backward  direction, 
Legendre  expansion  coefficients  are  converted  to  function  values  at  Chebyshev  nodes. 
Here  we  present  the  results  of  applying  the  algorithm  to  inputs  of  varying  size,  giving 
accuracies  and  running  times.  These  results  are  compared  to  a  direct  calculation. 

The  algorithm  has  been  implemented  both  in  single  precision  and  double  precision 
arithmetic.  The  number  k  of  Chebyshev  nodes  per  interval  was  chosen  to  retain  maximum 
precision.  The  size  s  of  the  smallest  interval  was  chosen  to  maximize  efficiency.  For  the 
single  precision  version,  we  chose  k  to  be  8  and  s  to  be  32.  For  double  precision,  k  is  18 
and  s  is  64. 

We  ran  our  FORTRAN  implementations  on  a  Sun  3/50  equipped  with  an  MC68881 
floating-point  coprocessor.  For  comparison,  times  required  to  compute  a  complex  FFT 
on  this  hardware  are  also  given.  All  accuracies  were  determined  by  comparing  to  results 
computed  in  quadruple  precision  (on  a  microVax),  using  the  direct  method.  The  measure 
of  error  was  chosen  to  be  the  I^-norm: 

E  = 

where  7  is  the  result  of  the  computation  being  tested  and  A  is  the  result  of  the  quadruple 
precision  computation.  Input  for  the  computations  in  the  backward  direction  (converting 
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from  Legendre  coefficients  to  function  values)  was  a  vector  whose  components  were  inde¬ 
pendent  pseudo-random  numbers,  each  drawn  from  a  distribution  uniformly  distributed 
in  the  interval  [0, 1].  Input  for  the  forward  direction  was  the  function  values  from  the 
backward  quadruple  precision  computation. 

Table  1 

Single  Precision  Computations 


Input 

Error 

Time  (sec) 

Size 

Algorithm 

Direct 

Algorithm 

Direct 

(n) 

Forward 

Backward 

Method 

Initial. 

Eval. 

Method 

FFT 

64 

0.335e-06 

0.133e-06 

0.112e-05 

0.92 

0.07 

0.30 

0.02 

128 

0.629e-06 

0.107e-06 

0.290e-05 

2.34 

0.17 

1.16 

0.06 

256 

0.888e-06 

0.117e-06 

0.762e-04 

5.48 

0.47 

4.78 

0.16 

512 

0.130e-05 

0.167e-06 

0.242e-03 

11.60 

1.07 

18.26 

0.32 

1024 

0.236e-05 

0.125e-06 

0.278e-03 

24.24 

2.31 

73.96 

0.72 

2048 

0.285e-05 

0.171e-06 

0.396e-02 

48.76 

4.90 

297.56 

1.66 

4096 

0.503e-05 

0.262e-06 

0.121e-01 

100.18 

9.84 

1168.18 

3.48 

The  direct  computation  of  £  a,  •  P;(t)  was  accomplished  by  applying  the  three-term 
recurrence  relation  for  Legendre  polynomials  (Eq.  19).  This  procedure  produces  sig¬ 
nificant  round-off  errors  for  t  near  —1  and  1,  which  impairs  the  accuracy  of  the  direct 
computation.  It  is  given  here  primarily  for  CPU  time  comparisons. 

Table  2 

Double  Precision  Computations 


Input 

Error 

Time  (sec) 

Size 

Algorithm 

Direct 

Algorithm 

Direct 

(n) 

Forward 

Backward 

Method 

Initial. 

Eval. 

Method 

FFT 

64 

0.152e-14 

0.673e-15 

0.756e-15 

1.86 

0.11 

0.30 

0.02 

128 

0.201e-14 

0.695e-15 

0.209e-14 

4.56 

0.28 

1.18 

0.08 

256 

0.312e-14 

0.723e-15 

0.134e-13 

10.80 

0.71 

4.80 

0.16 

512 

0.495e-14 

0.725e-15 

0.127e-12 

24.32 

1.96 

19.22 

0.36 

1024 

0.689e-14 

0.768e-15 

0.455e-12 

52.30 

4.62 

76.92 

0.78 

2048 

0.908e-14 

0.787e-15 

0.827e-12 

108.50 

10.12 

307.40 

1.76 

4096 

0.139e-13 

0.840e-15 

0.716e-ll 

223.00 

21.38 

1229.28 

3.70 

Several  observations  may  be  made  from  Tables  1  and  2. 

1.  The  algorithm  permits  very  high  accuracy.  In  single  precision  the  relative  error  is 
less  than  6  x  10-6  for  all  input  sizes  up  to  4096.  For  double  precision,  the  relative 
error  is  less  than  2  x  10-14. 

2.  The  error  of  the  algorithm  increases  roughly  as  the  square  root  of  the  length  of  the 
input  vector. 
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3.  The  time  required  to  make  the  single  precision  computation  is  less  than  3  times 
that  for  an  FFT  of  the  same  size.  For  double  precision,  the  ratio  is  about  5.5  for 
n  in  the  range  we  have  tabulated.  (For  larger  n,  the  ratio  would  be  less  than  5.5.) 
The  initialization  time  is  roughly  10  times  the  computation  time,  for  both  single 
and  double  precision. 

4.  In  single  precision  computations,  for  n  as  low  as  64,  the  time  required  by  the 
algorithm  is  less  than  one-fourth  of  the  time  required  by  the  direct  method.  For 
n  —  4096,  the  speedup  is  120-fold.  For  double  precision,  the  speedups  are  roughly 
3-fold  at  n  =  64  and  60-fold  at  n  =  4096. 


7  Generalizations  and  Conclusions 


An  algorithm  has  been  presented  for  the  rapid  conversion  of  the  coefficients  of  a  Legendre 
expansion  of  a  function  on  the  interval  [—1,1]  into  its  values  at  the  Chebyshev  nodes 
on  that  interval,  and  vice  versa.  The  algorithm  requires  order  O(nlogn)  operations  to 
transform  an  n-term  expansion  into  n  function  values,  or  n  function  values  into  an  n-term 
expansion. 

The  method  admits  several  straightforward  generalizations. 

1.  As  described  in  Section  4.4,  the  scheme  requires  that  n  (the  number  of  Chebyshev 
nodes  on  the  interval  [—1,1],  and  also  the  length  of  the  Legendre  expansion)  be  a 
power  of  2.  Clearly,  this  is  not  an  essential  requirement.  A  simple  modification  of 
the  scheme  will  convert  an  n-term  Chebyshev  expansion  into  an  n-term  Legendre 
expansion,  and  vice-versa,  for  any  positive  integer  n.  On  the  other  hand,  the  fast 
Fourier  transform  used  to  evaluate  the  Chebyshev  series  at  the  Chebyshev  nodes 
does  impose  certain  algebraic  requirements  on  n.  Thus,  the  scheme  of  this  paper 
will  perform  efficiently  whenever  the  FFT  does. 

2.  Clearly,  the  problem  solved  by  the  algorithm  of  this  paper  is  a  particular  case  of  the 
following  problem,  regularly  encountered  in  the  numerical  treatment  of  equations 
of  mathematical  physics. 


Problem  1.  Given  the  coefficients  onm  for  n  =  0, 1, . . . ,  N  —  1,  and  m  =  — n,  — n  + 
1, . . . ,  n,  evaluate  the  expression 

/(*.*)  »E  E  CM  »)•«*"*,  (49) 


at  the  points  (0i><Pj)i  for  =  0, 1, . . . , N  —  1,  defined  by  the  formulae 


6>  = 


2  i  +  1 


(In  Eq.  (49),  P™  denotes  the  associated  Legendre  function  of  degree  n  and  order 
m,  as  defined  in,  e.g.,  [1].) 
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The  algorithm  of  this  paper  admits  a  generalization  that  solves  Problem  1  in  order 
O (N2  log  N))  operations,  and  the  paper  describing  such  a  scheme  is  in  preparation. 

3.  The  heart  of  this  paper  is  an  order  0(n)  algorithm  converting  an  n-term  Legendre 
expansion  of  a  function  into  an  n-term  Chebyshev  expansion  of  that  function,  and 
vice-versa.  The  algorithm  is  based  on  the  fact  that  elements  of  the  matrices  Mn, 
Ln  connecting  the  two  expansions  are  a  smooth  function  of  the  indices,  except  near 
the  diagonal.  In  this  respect,  the  scheme  of  this  paper  is  somewhat  similar  to  those 
of  [9],  [6],  and  several  others.  A  radical  generalization  of  this  approach  is  to  observe 
that  any  matrix  whose  elements  are  a  sufficiently  smooth  non-oscillatory  function  of 
their  indices  can  be  applied  to  an  arbitrary  vector  for  a  cost  proportional  to  n  (with 
a  fixed  precision).  As  the  matrix  becomes  less  smooth,  the  procedure  becomes  less 
efficient.  For  matrices  singular  near  the  diagonal  (such  as  the  matrices  Mn,  Ln  of 
this  paper),  the  scheme  is  still  of  order  O (n).  The  same  is  true  for  matrices  with 
singularities  along  a  fixed  number  of  bands  of  fixed  width.  It  is  easy  to  construct 
examples  of  matrices  for  which  the  method  will  be  of  order  O(nlogn),  and  other 
examples  for  which  the  method  will  fail  completely.  An  investigation  of  these  issues 
is  in  progress,  and  the  authors  are  aware  of  at  least  one  other  such  algorithm  [2]. 
This  latter  scheme,  however,  is  based  on  a  somewhat  different  set  of  techniques. 
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Appendices 


A  Numerical  Evaluation  of  the  Function  A 


The  definitions  of  the  functions  M. ,  C  (Eqs.  22  and  23)  involve  the  function  A,  defined 
as 


A(2) 


r(«  + 1/2) 
r(*  +  i)  ’ 


for  z  €  C.  The  algorithm,  described  in  Sections  4  and  5,  actually  requires  computation  of 
A(x)  for  x  €  R+.  The  function  A  may  of  course  be  computed  using  the  Sterling  asymptotic 
expansion  for  T,  but  for  an  efficient  computation  we  use  the  asymptotic  formula 


A(x)  ~ 


as  x  —►  oo. 


(50) 


For  large  values  of  x,  therefore,  the  function  A(x)y/x  may  be  well  approximated  by  a 
polynomial  in  \/x.  Defining  the  change  of  variable  y  =  1/x,  we  obtain  the  approximate 
formula 

A  (±) 

— yr-  «  oo  +  aiy  H - h  asys, 

where  the  coefficients  ao,  , . . . ,  a5  have  been  determined  by  evaluating  the  function 
A(l/y)/v/y  at  Chebyshev  nodes  on  each  of  four  intervals.  The  values  of  the  coefficients 
are  shown  in  Table  A.l. 


A(i' 

Coefficient 

Table  A.l 

Coefficients  cto,  Oj, . . . ,  as  in  the  formula 
)  53  \/V  [«o  +  Oiy  +  •••-(-  a5y5),  for  four  separate  intervals 
y  €  (0,  .02)  y  6  (.02,  .04) 

ao 

0.99999999999999999d+  00 

0.99999999999974725d  +  00 

— 0.12499999999996888d  4-  00 

— 0.12499999994706490d  +  00 

a2 

0.78124999819509772d  -  02 

0.78124954632722315d  -  02 

03 

0.48828163023526451d  -  02 

0.48830152125039076d  -  02 

a4 

— 0.64122689844951054d  -  03 

-0.64579205161 159155d  -  03 

a  s 

— 0. 15070098356496836d  -  02 

— 0.14628616278637035d  -  02 

Coefficient 

y  e  (.04,  .058) 

y  €  (.058,  .067) 

Oo 

0.99999999999298844d +  00 

0.99999999996378690d  +  00 

0\ 

-0.12499999914033463d  +  00 

— 0.12499999657282661d  +  00 

a2 

0. 781245654471 11342d  -  02 

0.78123659464717666d  -  02 

03 

0.48839648427432678d  -  02 

0.4885568591 1602214d  -  02 

a< 

-0.65752249058053233d  -  03 

—0.67176366234107532 d  -  03 

05 

— 0.14041419931494052d  -  02 

-0. 13533949520771 154d  -  02 
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For  small  values  of  x,  formula  (50)  is  no  longer  useful.  Fortunately,  however,  the 
evaluation  of  A  at  small  values  of  x,  which  corresponds  to  the  evaluation  of  M  and  C 
near  the  diagonal,  is  necessary  only  at  half-integer  values  of  x.  In  particular,  it  suffices  to 
obtain  A  at  values  x  =  0,  .5, 1.0, 1.5, . .  .,(s  —  3)/2  and  x  €  [s/2  -  l,oo),  where  s  is  the  size 
of  the  smallest  interval,  as  defined  in  Section  4.2.  For  our  implementation,  s  >  32  (see 
Section  6).  Table  A.l  shows  the  coefficients  from  the  division  of  the  interval  (0, 1/15)  for 
y  into  four  subintervals.  The  subintervals  were  chosen  so  as  to  limit  the  approximation’s 
relative  error  to  2  x  10-15.  The  coefficient  values  were  computed  using  quadruple  precision 
arithmetic. 

Table  A. 2  contains  the  tabulation  of  A  at  the  half-integer  values  up  to  14.5. 


Table  A. 2 


Values  of  A( 

x  A(x) 

0  0.17724538509055159c?  +  01 

1  0.88622692545275794 d  +  00 

2  0.66467019408956851<?  +  00 

3  0.55389182840797380c?  +  00 

4  0.48465534985697706c?  +  00 

5  0.43618981487127934c?  +  00 

6  0.39984066363200604c?  +  00 

7  0.37128061622971992c?  +  00 

8  0. 34807557771536241c?  +  00 

9  0.32873804562006448c? +00 

10  0.31230114333906123c?  +  00 

11  0.29810563682364938c?  +  00 

12  0.28568456862266400c?  +  00 

13  0.27469670059871537c?  +  00 

14  0. 26488610414876124c?  +  00 


)  for  Small  x 

x  A(x) 

0.5  0.11283791670955126c?  +  01 

1.5  0. 75225277806367508c?  +  00 

2.5  0.60180222245094006c?  +  00 

3.5  0.51583047638652002c? +00 

4.5  0.45851597901023999c? +00 

5.5  0.416832708191 12728c? +00 

6.5  0.38476865371488672c?  +  00 

7.5  0.35911741013389425c?  +  00 

8.5  0.33799285659660638c? +00 

9.5  0.32020375888099550c?  +  00 

10.5  0.30495596083904336c?  +  00 

11.5  0.29169700601995452c? +00 

12.5  0.28002912577915634<i  +  00 

13.5  0.26965767667622464 d  +  00 

14.5  0.260359136101 18244c? +00 
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B  An  Alternative  Approach 


In  this  appendix  we  will  briefly  describe  an  alternative  approach  to  convert  between 
Legendre  and  Chebyshev  expansions.  The  algorithm  we  described  above  is  designed  to 
cope  with  the  singularities  in  M,  £  where  the  two  arguments  are  nearly  equal.  The 
definitions  of  M,  £  (Eqs.  23  and  22)  reveal  that  each  is  roughly  the  product  of  two 
lambda’s,  one  a  function  of  the  sum  of  the  arguments,  another  of  the  difference.  The 
singular  behavior  near  the  diagonal  is  due  to  the  function  of  the  difference.  Our  second 
algorithm  is  based  on  “factoring  out”  this  component. 

We  denote  by  Tn,  Sn  the  pair  of  n  x  n-matrices  defined  by  the  formulae 


rrm 

•i 


5” 


( 

{ 


A(j  —  i)  if  0  <  i  <  j  <  n 
0  otherwise 

i^)AU  -i-  1)  if  0  <  i  <  j  <  n 
0  otherwise, 


(51) 

(52) 


with  A  defined  by  Eq.  (10).  We  note  that  the  entries  along  a  forward  diagonal  of  T"  or 
Sn  are  all  equal,  and  it  is  not  difficult  to  show  that  the  matrices  Xn  and  Sn  are  inverses 
of  each  other.  We  may  therefore  write 


M”  =  Tn(SnMn) 

Ln  =  Sn(TnLn), 

where  the  matrix  products  in  parentheses  are  both  extremely  smooth  everywhere.  We 
will  not  prove  this  assertion  here;  we  will  instead  describe  how  this  fact  may  be  used  to 
create  our  alternative  algorithm. 

The  task  of  applying  the  transformation  Mn  is  reduced  to  applying  Rn  =  SnMn  then 
applying  Tn  to  the  result.  The  latter  task  may  be  accomplished  by  use  of  the  fast  Fourier 
transform  for  the  application  of  Tn.  Thus  our  task  is  reduced  to  applying  Rn.  The  key 
to  applying  Rn  is  that  an  entire  column  of  the  matrix,  excluding  the  diagonal  element, 
may  be  well-approximated  by  a  polynomial  of  small  degree.  In  fact,  the  degree  does  not 
depend  on  the  column  number.  This  observation  leads  to  a  very  simple  algorithm. 

We  introduce  the  notation  r/j  to  denote  the  /th  polynomial  coefficient  for  column  j, 

i.e., 

k 

ss  ^2  riji‘  for  j  =  1,2,...  and  i  =  0, . . . ,  j  —  1.  (53) 

1=0 

Here  we  use  k  to  denote  the  degree  of  the  polynomial  sufficient  to  maintain  a  given  level 
of  precision  (k  =  5  gives  single  precision  and  k  =  17  gives  double  precision).  Given  the 
matrix  size  n,  we  define 

n— 1 

qij  =  ^2  ri'jOtj  for  i  =  1, . . . ,  n  —  1  and  l  =  0, . . . ,  k.  (54) 

i=« 
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We  want  to  compute  7  =  Rna,  where  a  =  (a0,  •  •  • ,  on- 1).  Since  Rn  =  SnMn  is  upper- 
triangular,  7 ;  (t  =  0, . . .  ,n  —  1)  satisfies  the  formula  ‘ 

H  =  E  (55)  * 

j=* 

Combining  Eq.  (55)  with  (53),  changing  the  order  of  summation,  and  using  the  expression 
for  qiti  in  Eq.  (54),  yields 


n— 1 

7,*x; 

(  k 

En,/ 

|“j  =  E>'l 

(!>,«;) 

(56) 

j-i 

V/=o  / 

1=0 

\i=«  / 

1=0 

It  should  now  be  clear  that  all  qiti  (/  =  0, . . . ,  k  and  i  =  0, . . . ,  n  —  1)  can  be  computed 
from  the  r/j  in  order  0(nk)  operations  and  then  7  =  (70, . . . ,  7n-i)  in  an  additional  O(nk) 
operations.  The  degree  of  the  polynomials,  k,  depends  only  on  the  precision  required; 
with  the  precision  specified,  the  total  time  complexity  needed  to  apply  Rn  is  order  O(n). 

The  algorithm  may  be  used,  without  modification,  for  the  application  of  the  matrix 
qu  __  ynjr  n  ^  arbitrary  vector  of  length  n. 

An  issue  arises  in  computing  the  polynomial  coefficients  rjtj.  This  computation  is 
complicated  by  the  form  of  the  definition  of  Rn.  Each  matrix  element  is  expressed  as  a 
sum: 

flS  =  E^. 

Jssi 

Whereas  each  term  in  the  sum  may  be  computed  efficiently,  the  sum  itself  may  contain 
order  O(n)  terms.  On  the  face  of  it,  the  computation  of  the  elements  of  Rn,  and  therefore 
the  polynomial  coefficients  r/j,  is  prohibitively  time-consuming.  We  solve  this  problem  by 
computing  instead  a  corresponding  integral,  using  Gauss  quadrature,  then  determining 
the  sum  by  using  the  Euler-Maclaurin  summation  formula.  This  method  produces  an 
O(logn)  algorithm  for  computing  each  coefficient  rjj.  We  omit  further  details  of  this 
algorithm. 
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